Exposure database#

This notebook provides a systematic approach for analyzing and visualizing infrastructure networks across different European countries. It iterates through a list of countries and network types, retrieves the relevant geographic data, processes it to handle missing values, and generates visualizations to provide insights into the infrastructure distribution and characteristics within each country. Raw geospatial data is sourced from multiple databases, including OSM. Integration with OSM data is achieved using the OSMnx Python library, which simplifies downloading, modeling, analyzing, and visualizing OSM data. The integrated data is associated with EU codes representing all European countries (e.g., FR for France, NL for Netherlands). Geometrical representations, such as lines for roads, points for schools or substations, and polygons for areas like educational districts or power grids, are processed. Finally, the output is stored in vector Geoparquet files, named according to the network code, geometry type, and EU code (e.g., Network_code_geometry_EU-Code.parquet).

Hide code cell source

# HIDE CODE
# Import necessary libraries
import gc
import logging
import os
import shutil
import warnings
from pathlib import Path

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import osmnx as ox
import pandas as pd
from tqdm import tqdm
import folium
from IPython.display import display

# Set logging level
ox.settings.log_console = False
logging.getLogger("osmnx").setLevel(logging.ERROR)
# Ignore all warnings
warnings.filterwarnings("ignore")

import geopandas as gpd
from IPython.display import display
# Path to the tmp_cache directory
cache_dir = os.path.join(os.getcwd(), "tmp_cache")

# Create the directory if it doesn't exist
os.makedirs(cache_dir, exist_ok=True)

# Example of saving a file in the cache directory
with open(os.path.join(cache_dir, "cache_file.txt"), "w") as file:
    file.write("This is a cached file.")
# Define the folder where the database will be stored
# TODO: We could set a .miraca folder as a default for all packages
Miraca_Exposure_Database_path = Path("~/miraca_exposure_database").expanduser()

This dictionary maps the names of European countries to their respective ISO 3166-1 alpha-2 codes. These codes are two-letter country codes defined in the ISO 3166-1 standard.The dictionary keys are country names (str) and values are their ISO 3166-1 alpha-2 codes (str).

EU_code = {
    "Albania": "AL",
    "Austria": "AT",
    "Belgium": "BE",
    "Bulgaria": "BG",
    "Croatia": "HR",
    "Cyprus": "CY",
    "Czechia": "CZ",
    "Denmark": "DK",
    "Estonia": "EE",
    "Finland": "FI",
    "France": "FR",
    "Germany": "DE",
    "Greece": "GR",
    "Hungary": "HU",
    "Iceland": "IS",
    "Ireland": "IE",
    "Italy": "IT",
    "Latvia": "LV",
    "Liechtenstein": "LI",
    "Lithuania": "LT",
    "Luxembourg": "LU",
    "Malta": "MT",
    "Montenegro": "ME",
    "Netherlands": "NL",
    "North_Macedonia": "MK",
    "Norway": "NO",
    "Poland": "PL",
    "Portugal": "PT",
    "Romania": "RO",
    "Serbia": "RS",
    "Slovakia": "SK",
    "Slovenia": "SI",
    "Spain": "ES",
    "Sweden": "SE",
    "Switzerland": "CH",
    "United_Kingdom": "UK",
    "Svalbard_and_Jan_Mayen": "SJ",
}

This dictionary maps the names of European countries to the cost from GEM. The dictionary keys are country names (str) and values are the GEM costs (float).

Gem_cost = {
    "Austria": 1454.26,
    "Belgium": 1349.35,
    "Bulgaria": 530.44,
    "Cyprus": 1221.97,
    "Czech_Republic": 1167.68,
    "Denmark": 2629.95,
    "Estonia": 953.19,
    "Finland": 2005.17,
    "France": 2225.00,
    "Germany": 1869.00,
    "Greece": 954.97,
    "Hungary": 534.89,
    "Ireland": 1640.27,
    "Italy": 1424.00,
    "Latvia": 937.17,
    "Lithuania": 962.09,
    "Luxembourg": 1418.66,
    "Malta": 931.83,
    "Netherlands": 1401.43,
    "Poland": 1440.02,
    "Portugal": 801.00,
    "Romania": 775.19,
    "Slovakia": 849.95,
    "Slovenia": 1110.72,
    "Spain": 747.60,
    "Sweden": 3528.85,
}
def map_network_to_code(network):
    """
    This function maps network names to shorter codes according to a predefined mapping.

    Arguments:
    network (str): The network name to be mapped.

    Returns:
    str: The corresponding code for the network, or the original network if not found in the mapping.
    """

    network_mapping = [
        ("Education", "Edu"),
        ("Healthcare", "Health"),
        ("Airports", "Airport"),
        ("Ports", "Port"),
        ("Roadway", "Road"),
        ("Railway", "Rail"),
        ("Telecommunication", "Telecom"),
        ("Electricity", "Elec"),
    ]

    for network_name, network_code in network_mapping:
        if network == network_name:
            return network_code
    return network
def get_network_tags(network):
    """
    This function takes an network type and returns the corresponding OpenStreetMap tags.

    Parameters:
    network (str): The type of network for which OSM tags are required.

    Returns:
    dict: A dictionary containing OSM tags associated with the specified network type.
    Returns an empty dictionary if the network type is not found.
    """

    network_tags = {
        "Education": {"amenity": "school"},
        "Healthcare": {"amenity": "hospital"},
        "Airports": {"aeroway": "aerodrome"},
        "Ports": {"waterway": "harbour", "landuse": "harbour", "man_made": "pier"},
        "Roadway": {
            "highway": [
                "motorway",
                "primary",
                "secondary",
                "tertiary",
                "motorway_link",
                "primary_link",
                "secondary_link",
                "tertiary_link",
            ]
        },
        "Railway": {"railway": ["rail", "station"]},
        "Telecommunication": {"man_made": ["communications_tower", "mast"]},
        "Electricity": {
            "power": [
                "cable",
                "plant",
                "pole",
                "substation",
                "tower",
                "line",
                "minor_line",
            ]
        },
    }
    return network_tags.get(network)
def get_network_columns(network):
    """
    This function returns the list of columns associated with a particular network type.

    Parameters:
    network (str): The type of network for which column names are required.

    Returns:
    list: A list of column names associated with the specified network type.
    """
    network_columns = {
        "Education": [
            "name",
            "geometry",
            "amenity",
            "building:levels",
            "material",
            "height",
        ],
        "Healthcare": [
            "name",
            "geometry",
            "amenity",
            "emergency",
            "building:levels",
            "height",
            "beds",
        ],
        "Airports": [
            "name",
            "geometry",
            "aeroway",
            "ref",
            "int_name",
            "ele",
            "operator",
            "surface",
            "iata",
            "icao",
            "passengers",
            "landuse",
            "layer",
        ],
        "Ports": [
            "name",
            "geometry",
            "waterway",
            "man_made",
            "landuse",
            "ele",
            "surface",
            "tracktype",
            "material",
            "incline",
            "layer",
            "smoothness",
            "bridge",
            "oneway",
            "maxspeed",
            "maxweight",
        ],
        "Roadway": [
            "geometry",
            "highway",
            "name",
            "int_name",
            "int_ref",
            "maxspeed",
            "lanes",
            "oneway",
            "bridge",
            "sidewalk",
            "surface",
            "smoothness",
            "maxheight",
            "tunnel",
            "layer",
        ],
        "Railway": [
            "name",
            "geometry",
            "maxspeed",
            "voltage",
            "bridge",
            "layer",
            "tunnel",
            "cutting",
            "electrified",
            "usage",
            "embankment",
            "frequency",
            "tracks",
            "gauge",
        ],
        "Telecommunication": [
            "geometry",
            "man_made",
            "tower_type",
            "mast_type",
            "height",
            "ele",
            "material",
        ],
        "Electricity": [
            "name",
            "geometry",
            "power",
            "cables",
            "voltage",
            "location",
            "frequency",
            "type",
            "wires",
            "layer",
            "disused",
            "operator",
            "circuits",
            "usage",
        ],
    }
    return network_columns.get(network)
def replace_nan_values(gdf, column_name, value):
    """
    This function replaces NaN values in a specified column of a GeoDataFrame.

    Parameters:
    gdf (GeoDataFrame): The GeoDataFrame to process.
    column_name (str): The name of the column in which to replace NaN values.
    value (str): The value to use as a replacement.

    Returns:
    GeoDataFrame: The processed GeoDataFrame.
    """
    gdf[column_name] = gdf[column_name].fillna(value)
    return gdf

Select, download and save geospatial data using osmnx#

def select_data_osmnx(country, network, database):
    """
    Fetches and processes geospatial data for a specific network type in a given country using OSMnx.

    Parameters:
    country (str): The name of the country for which data is being fetched.
    network (str): The type of network for which data is being fetched.
    database (str): The name of the database.

    Returns:
    GeoDataFrame or None: A GeoDataFrame containing the processed data if successful,
                          None if there was an error fetching the data.

    Raises:
    Exception: Re-raises any unexpected exceptions encountered during data retrieval.
    """

    try:
        # Get the abbreviation for the country
        country_abbr = EU_code.get(country, country)

        # Get the tags for the given network type
        tags = get_network_tags(network)

        if tags is None:
            print(f"No tags defined for network type '{network}'")
            return

        network_code = map_network_to_code(network)
        features = ox.features_from_place(country, tags=tags)

        # Define a dictionary that maps each network to the preferred geometry types
        network_geometry = {
            "Education": ["Point", "Polygon"],
            "Healthcare": ["Point", "Polygon"],
            "Airports": ["Point", "Polygon"],
            "Ports": ["Point", "Polygon"],
            "Roadway": ["LineString"],
            'Railway': ['LineString', 'Point'],
            "Telecommunication": ["Point"],
            "Electricity": ["Point", "LineString"],
        }
        preferred_geometries = network_geometry.get(network)

        if preferred_geometries is None:
            print(f"No preferred geometries defined for network type '{network}'")
            return

        # Initialize gdf as an empty GeoDataFrame
        gdf = gpd.GeoDataFrame()
        # Initialize an empty list to hold each GeoDataFrame for different geometries
        gdf_list = []

        # Define colors for each geometry type
        geometry_colors = {
            "Point": "#ED5861",        # Red for points
            "LineString": "#4069F6", # Black for lines
            "Polygon": "#4069F6",    # Green for polygons
        }
        
        # Keep only the features with the preferred geometries
        for geom_type in preferred_geometries:
            geom_features = features[features["geometry"].geom_type == geom_type]
            geom_features = geom_features.dropna(subset=["geometry"])

            if "geometry" not in geom_features.columns:
                geom_features["geometry"] = None

            columns = get_network_columns(network)

            # Explicitly create GeoDataFrame
            gdf = gpd.GeoDataFrame(geom_features, columns=columns)

            # Add columns for country name and abbreviation
            gdf["Country"] = country
            gdf["Country_code"] = country_abbr
            gdf["Geometry"] = geom_type

            # Set the CRS for the GeoDataFrame
            gdf.crs = "EPSG:4326"

            if network in ["Roadway", "Railway"]:
                # Call the function to replace NaN values
                gdf = replace_nan_values(gdf, "bridge", "no")
                gdf = replace_nan_values(gdf, "tunnel", "no")
                gdf = replace_nan_values(gdf, "layer", 0)
                # Convert 'layer' column to numeric, handling errors by setting them to NaN
                gdf["layer"] = pd.to_numeric(gdf["layer"], errors="coerce")

            if network == "Education" and country in Gem_cost:
                gdf["Cost_per_area"] = "{:.2f}".format(Gem_cost.get(country, np.nan))

            # Save to Geoparquet
            output_directory = os.path.join(
                Miraca_Exposure_Database_path, network, database, country_abbr
            )
            os.makedirs(output_directory, exist_ok=True)
            geom_type_to_letter = {
                "Point": "P",
                "LineString": "L",
                "Polygon": "Α",
                # Add more geometry types as needed
            }
            # Get the specific letter for the current geometry type
            geom_type_letter = geom_type_to_letter.get(geom_type, "")

            output_filename = os.path.join(
                output_directory,
                f"{network_code}{geom_type_letter}_{country_abbr}.parquet",
            )
            gdf.to_parquet(output_filename, engine="pyarrow", compression="snappy")
    
            # Append the current GeoDataFrame to the list
            gdf_list.append(gdf)

        # Combine all GeoDataFrames into a single one
        final_gdf = gpd.GeoDataFrame(pd.concat(gdf_list, ignore_index=True))

        # Color the features based on their geometry type
        # Add a 'color' column to the GeoDataFrame
        final_gdf['color'] = final_gdf['Geometry'].map(geometry_colors)

        # Display the final GeoDataFrame with explore() (interactive map)
        display(final_gdf.explore(
        color=final_gdf['color'],
        marker_kwds={
            "radius": 4,         # Point size
            "fillOpacity": 0,    # Fill opacity for points
            "opacity": 1         # Stroke opacity for points
        },
        style_kwds={
            "weight": 3,         # Line width
            "opacity": 1         # Line opacity (1 = fully opaque)
        }
        ))


        return gdf
        
    except Exception as e:
        error_message = str(e)
        if "No data elements in server response" in error_message:
            print(f"Error fetching data for {network}: {error_message}")
            return None
        else:
            raise

Working with street network using osmnx#

def get_street_network(country, network, database):
    """
    Download, model, and save a street network for a specified country and network type.

    Parameters:
    country (str): The name of the country for which the street network is to be modeled.
    network (str): The type of network for which the street network is to be modeled.
    database (str): The name of the database where the output files will be saved.

    """
    # Get the abbreviation for the country
    country_abbr = EU_code.get(country, country)

    # Get the tags for the given network type
    tags = get_network_tags(network)

    if tags is None:
        print(f"No tags defined for network type '{network}'")
        return

    network_code = map_network_to_code(network)

    # Define the types of roadways you want to include
    road_types = [
        "motorway",
        "primary",
        "secondary",
        "tertiary",
        "trunk",
        "motorway_link",
        "primary_link",
        "secondary_link",
        "tertiary_link",
        "trunk_link",
    ]

    # Download/model a street network for the specified country, including only specified road types
    features = ox.graph_from_place(
        country,
        network_type="drive",
        custom_filter='["highway"~"{}"]'.format("|".join(road_types)),
        retain_all=True,
        simplify=True,
        truncate_by_edge=True,
    )

    # explore nodes and edges together in a single map
    gdf_nodes, gdf_edges = ox.graph_to_gdfs(features)
    
    # Save graph as a GeoPackage
    output_directory = os.path.join(
        Miraca_Exposure_Database_path, network, database, country_abbr
    )
    os.makedirs(output_directory, exist_ok=True)
    output_filename = os.path.join(
        output_directory, f"{network_code}_{country_abbr}.gpkg"
    )
    ox.save_graph_geopackage(features, output_filename)
    
    # Read the edges using GeoPandas
    gdf_L = gpd.read_file(output_filename, layer="edges")
    # Add columns for country name and abbreviation
    gdf_L["Country"] = country
    gdf_L["Country_code"] = country_abbr
    gdf_L["Geometry"] = "Linestring"
    
    # Define the path for the output GeoParquet file
    geoparquet_path = os.path.join(
        output_directory, f"{network_code}L_{country_abbr}.parquet"
    )
    # Write the GeoDataFrame to GeoParquet format
    gdf_L.to_parquet(geoparquet_path, index=None)

    # Read the nodes using GeoPandas
    gdf_P = gpd.read_file(output_filename, layer="nodes")
    # Add columns for country name and abbreviation
    gdf_P["Country"] = country
    gdf_P["Country_code"] = country_abbr
    gdf_P["Geometry"] = "Point"
    # Define the path for the output GeoParquet file
    geoparquet_path = os.path.join(
        output_directory, f"{network_code}P_{country_abbr}.parquet"
    )
    # Write the GeoDataFrame to GeoParquet format
    gdf_P.to_parquet(geoparquet_path, index=None)


    # Automatically delete the GeoPackage file after saving Parquet files
    if os.path.exists(output_filename):
        os.remove(output_filename)
    else:
        print("GeoPackage file not found, nothing to delete.")
    
    # Create base map with lines (edges)
    m = gdf_L.explore(
        color="black",
        tiles="OpenStreetMap",
        style_kwds={
            "weight": 3,       # Line thickness
            "opacity": 1       # Full opacity (less transparent)
        }
    )
    
    # Add points (nodes) to the map
    gdf_P.explore(
        m=m,
        color="#ED5861",
        marker_kwds={
            "radius": 3,       # Point size
            "fillOpacity": 1,  # Fill transparency
            "opacity": 1       # Border transparency
        }
    )

    # Display the map
    display(m)
# List of networks
network_list = [
    "Education",
    "Healthcare",
    "Airports",
    "Ports",
    "Railway",
    "Telecommunication",
    "Electricity",
]

Specify the network and country for which you would like to run the code

network_type = "Education"
country = "Cyprus"
print(f"Running for {network_type} in {country}")
gdf = select_data_osmnx(country, network_type, "OSM")
Running for Education in Cyprus
Make this Notebook Trusted to load map: File -> Trust Notebook
street_network = get_street_network(EU_code[country], "Roadway", "OSM")
C:\Users\Paraskevi Tsoumani\anaconda3\envs\miraca\Lib\site-packages\osmnx\_overpass.py:254: UserWarning: This area is 11 times your configured Overpass max query area size. It will automatically be divided up into multiple sub-queries accordingly. This may take a long time.
  multi_poly_proj = utils_geo._consolidate_subdivide_geometry(poly_proj)
Make this Notebook Trusted to load map: File -> Trust Notebook